iDocument: Using Ontologies for Extracting and Annotating Information from Unstructured Text
Abstract
Given the vast amount of text on the Web, annotating unstructured text with semantic markup is a crucial topic in Semantic Web research. This work formally analyzes how domain ontologies are incorporated into information extraction tasks in iDocument. Ontology-based information extraction exploits domain ontologies, which hold formalized and structured domain knowledge, to extract domain-relevant information from un-annotated, unstructured text. iDocument provides a pipeline architecture, an extraction template interface, and the ability to exchange domain ontologies for performing information extraction tasks. This work outlines iDocument's ontology-based architecture, the use of SPARQL queries as extraction templates, and an evaluation of iDocument in an automatic document annotation scenario.
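The idea of using a query as an extraction template can be illustrated with a small, self-contained sketch. It is not iDocument's API: the toy ontology, the `ex:` names, and the functions `recognize_instances`, `match_template`, and `extract` are all hypothetical, and a hand-written triple-pattern matcher stands in for a real SPARQL engine. The sketch first finds ontology instances whose labels occur in the text, then keeps only those template solutions whose bindings were all mentioned.

```python
import re

# Hypothetical toy domain ontology as (subject, predicate, object) triples.
ONTOLOGY = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "rdfs:label", "Alice Smith"),
    ("ex:acme", "rdf:type", "ex:Organization"),
    ("ex:acme", "rdfs:label", "Acme Corp"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "rdfs:label", "Bob Jones"),
]

# Extraction template, written as triple patterns. It plays the role of:
#   SELECT ?person ?org WHERE {
#     ?person rdf:type ex:Person .
#     ?person ex:worksFor ?org .
#     ?org rdf:type ex:Organization . }
TEMPLATE = [
    ("?person", "rdf:type", "ex:Person"),
    ("?person", "ex:worksFor", "?org"),
    ("?org", "rdf:type", "ex:Organization"),
]


def recognize_instances(text, ontology):
    """Return the ontology instances whose rdfs:label occurs in the text."""
    found = set()
    for subject, predicate, obj in ontology:
        if predicate != "rdfs:label":
            continue
        if re.search(r"\b" + re.escape(obj) + r"\b", text, re.IGNORECASE):
            found.add(subject)
    return found


def match_template(template, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not template:
        yield dict(binding)
        return
    first, rest = template[0], template[1:]
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for pattern, value in zip(first, triple):
            if pattern.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(pattern, value) != value:
                    matched = False
                    break
            elif pattern != value:
                matched = False
                break
        if matched:
            yield from match_template(rest, triples, candidate)


def extract(text, ontology, template):
    """Keep template solutions whose bound instances are all mentioned in the text."""
    mentioned = recognize_instances(text, ontology)
    return [
        solution
        for solution in match_template(template, ontology)
        if all(value in mentioned for value in solution.values())
    ]


text = "Yesterday Alice Smith presented the new Acme Corp roadmap."
print(extract(text, ONTOLOGY, TEMPLATE))
```

Because the template is data rather than code, swapping in a different ontology or a different set of patterns changes what is extracted without touching the matching logic, which is the property the abstract attributes to exchangeable domain ontologies and query-based templates.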
Related resources
iDocument: Using Ontologies for Extracting Information from Text
This work outlines the system and usage principles of the ontology-based information extraction system iDocument. Ontology-based information extraction reuses existing domain knowledge for extracting and annotating relevant information from domain-related text. iDocument provides an architecture, an API, and a user interface for supporting users and developers in ontology-based knowledge annotation...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Towards Annotating and Extracting Textual Legal Case Elements
In common law contexts, judges and juries decide a legal case by following previously decided cases (precedents) rather than legislation, as in civil law contexts. The set of such cases is the legal case base. Legal professionals must find, analyse, and reason with and about cases drawn from the case base in the course of arguing for a decision in a current undecided case. A range of elements of c...
On Ontology Based Abduction for Text Interpretation
Text interpretation can be considered as the process of extracting deep-level semantics from unstructured text documents. Deep-level semantics represent abstract index structures that enhance the precision and recall of information retrieval tasks. In this work we discuss the use of ontologies as valuable assets to support the extraction of deep-level semantics in the context of a generic archit...
A new text-mining method for extracting user context information to improve the ranking of search engine results
Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...
Publication date: 2009